(54) Abstract Title

Generation system for audio, video or a combination thereof where metadata is generated and stored or
recorded with the audio/video signal

(57) An audio and/or video generation apparatus which is arranged in operation to generate an audio signal, a
video signal or a combination of the two, representative of an audio/video source, the audio/video generation
apparatus having recording means to record the audio/video signal on to a recording medium (126). The
apparatus is arranged to receive metadata associated with the audio/video signals that has been generated by
a data processor (128), the metadata being recorded on the recording medium with the audio/video signal.
The data processor may be arranged to receive signals representative of the time codes of the recorded
audio/video signals, and the metadata may include the time code data representative of the in and out points
of a take of the audio/video signals, generated by the data processor. The metadata may also include a unique
identification code for identifying the audio/video signal. The unique identification code may be a UMID or the
like. There is also a dependent claim for a computer program.

[Drawings: sheets 1/13 to 13/13, including FIG. 2 and FIG. 4, are not reproduced here.]

AUDIO AND/OR VIDEO GENERATION APPARATUS AND METHOD OF
GENERATING AUDIO AND/OR VIDEO SIGNALS

Field of the Invention 

The present invention relates to audio and/or video generation apparatus and
to methods of generating audio and/or video signals.
Background of the Invention 

The subject matter and content of audio and video productions varies greatly.
In addition to this variety there is, correspondingly, a considerable quantity of such
audio and video productions. The audio productions include, for example, radio
broadcasts, both live and pre-recorded, musical and audio recordings, whereas video
productions include, for example, films, television programs and video recordings. As
will be appreciated, video productions typically also include an accompanying sound
track or commentary, so that an audio production is inherently included as part of the
video production.

The term audio and/or video will be used herein to refer to any form of audio
information or signals, video information or signals, or a combination of video and
audio information or signals. The term audio/video will be used for short to refer to
audio and/or video.

As a result of the great variety and quantity of audio/video productions, the task
of locating particular content items of audio/video material within an archive of
audio/video productions represents an arduous and labour intensive task, because an
operator must visually search the audio/video productions for the desired content item.
Furthermore, because of the length of audio/video productions which are typically
although not exclusively stored on linear recording media, the task of navigating
through the media to locate particular content items of audio/video material from an
audio/video production is time consuming and labour intensive.

In our co-pending UK patent application number GB 9921235.9 there is 
disclosed a method and apparatus for navigating through the content of audio/video 
material using metadata which represents the content of the audio/video material. 

The term metadata as used herein refers to and includes any form of
information or data which serves to describe either the content of audio/video material
or parameters present or used to generate the audio/video material or any other
information associated with the audio/video material. Metadata may be, for example,
"semantic metadata" which provides contextual/descriptive information about the
actual content of the audio/video material. Examples of semantic metadata are the
start of periods of dialogue, changes in a scene, introduction of new faces or face
positions within a scene or any other items associated with the source content of the
audio/video material. The metadata may also be syntactic metadata which is associated
with items of equipment or parameters which were used whilst generating the
audio/video material such as, for example, an amount of zoom applied to a camera
lens, an aperture and shutter speed setting of the lens, and a time and date when the
audio/video material was generated. Although metadata may be recorded with the
audio/video material with which it is associated, either on separate parts of a recording
medium or on common parts of a recording medium, metadata in the sense used herein
is intended for use in navigating and identifying features and essence of the content of
the audio/video material, and may, therefore, be separated from the audio/video signals
when the audio/video signals are reproduced. The metadata is therefore separable from
the audio/video signals.

The apparatus and method for navigating through the content of audio/video
material disclosed in the co-pending UK patent application number GB 9921235.9
uses the metadata which has been generated with the audio/video signals to navigate
through the items of contextual or essence information of the audio/video material.

In a further co-pending UK patent application number 99212342 there is
disclosed an editing system for editing source content such as audio/video material to
produce an edited audio/video production by applying a template representative of a
desired production style to metadata associated with the audio/video material to form
the production.

Summary of the Invention

According to the present invention there is provided an audio and/or video
generation apparatus which is arranged in operation to generate audio and/or video
signals representative of an audio and/or video source, the audio and/or video
generation apparatus comprising a recording means which is arranged in operation to
record the audio and/or video signals on a recording medium, wherein the audio and/or
video generation apparatus is arranged to receive metadata associated with the audio
and/or video signals generated by a data processor, the recording means being arranged
in operation to record the metadata on the recording medium with the audio and/or
video signals.

As discussed above there is a great variety in the nature and content of
audio/video productions. Although it is known to associate metadata with audio/video
productions for facilitating asset management for archiving the audio/video
productions, as indicated in our co-pending patent applications mentioned above, the
present invention recognises that metadata can be used for facilitating the creation of
the audio/video productions by editing and navigating through the content of the
audio/video material.

An improvement in the creation of audio/video productions is achieved by
providing an audio/video generation apparatus, which generates metadata and stores
the metadata with the audio/video signals on a recording medium. As such the
metadata which describes the content of the audio/video signals can be read from the
recording medium separately or in parallel, and so provides an indication of the content
of these audio/video signals without having to reproduce these signals. Generating
metadata which describes the content of the audio/video material, and recording the
metadata with audio/video signals on the recording medium provides a particular
advantage when the audio/video signals are edited to form an audio/video production.
This is because the audio/video signals may be selectively reproduced from the
recording medium in accordance with the metadata describing the content of the
audio/video signals without reproducing and viewing the signals themselves which is
time consuming and labour intensive. As such the efficiency of the editing process is
correspondingly improved.

A further improvement is provided wherein the data processor is arranged to
detect signals representative of a time code of the recorded audio/video signals and the
metadata includes time code data representative of the in and out points of a take of the
audio/video signals generated by said data processor. By recording metadata with the
audio/video signals which provides the time codes of the in and out points of the take
forming part of the audio/video signals, the individual content items of the
audio/video signals may be identified for editing.

An audio/video generation apparatus which is arranged to receive metadata
generated by a data processor is provided with an improved facility for introducing
metadata associated with audio/video signals generated by the audio/video apparatus.
The data processor may form part of the audio/video generation apparatus or the data
processor may be separate therefrom.

The audio/video generation apparatus may be provided with a user interface
having a predetermined format for connecting the audio and/or video generation
apparatus to the data processor. The interface therefore provides a facility for the data
processor to be connected to the audio and/or video generation apparatus using the
interface. The predetermined format may be of a common type thereby providing a
facility for a range of possible data processors to be connected to the audio/video
generation apparatus. As such, the data processor provides a facility for a user to
generate metadata and for including this metadata with the audio and/or video signals
generated by the audio/video generation apparatus. The metadata may be recorded
on the recording medium separately from the audio and/or video signals.

In preferred embodiments, the interface may provide a facility for receiving
signals from the audio/video generation apparatus. The signals may be representative
of the time code present on the recording medium. As such the data processor may be
arranged in operation to receive signals representative of the time code of the recorded
signals via the interface and to generate said metadata.

According to an aspect of the present invention, there is provided an audio
and/or video generation apparatus which is arranged in operation to generate audio
and/or video signals representative of an audio and/or visual source, the audio and/or
video generation apparatus comprising a data processor which is arranged in operation
to detect time codes associated with the audio and/or video signals and to store data
being representative of the time codes associated with at least part of the audio/video
signals in a data store.

Storing the time codes associated with the audio/video signals separately in a
data store provides a facility for addressing the audio/video signals recorded in the
recording medium separately. As such, in embodiments of the present invention, the
time code data may be representative of the time codes at an in point and an out point of
said at least part of the audio/video signals. Parts of the audio/video signals may,
therefore, be identified from the time code data.

A further advantage is provided in automatically generating a unique
identification code to identify the audio/video signals as they are being generated.
Therefore, the metadata may include a unique identification code for uniquely
identifying part or parts of the audio/video signals. The part or parts may be takes of
audio/video material. In preferred embodiments the unique identification code may be
a UMID or the like.

In a preferred embodiment the audio and/or video generation apparatus may be 
a video camera, camcorder, television camera, cinema camera or the like. 

According to an aspect of the present invention there is provided a metadata
generation tool which is arranged in operation to receive audio and/or video signals
representative of an audio and/or visual source, and to generate metadata associated
with the audio and/or video signals, the generation apparatus comprising a data
processor which is arranged in operation to generate the metadata in response to the
audio and/or video signals and to store the metadata associated with at least part of the
audio/video signals in a data store, wherein the data processor is arranged in operation
to detect time codes associated with the audio and/or video signals, the generated
metadata being representative of the time codes associated with at least part of the
audio/video signals.

Various further aspects and features of the present invention are defined in the
appended claims.

Brief Description of the Drawings

Embodiments of the present invention will now be described by way of 
example with reference to the accompanying drawings wherein: 

Figure 1 is a schematic block diagram of a video camera arranged in operative
association with a Personal Digital Assistant (PDA),

Figure 2 is a schematic block diagram of parts of the video camera shown in
figure 1,

Figure 3 is a pictorial representation providing an example of the form of the
PDA shown in figure 1,

Figure 4 is a schematic block diagram of a further example arrangement of
parts of a video camera and some of the parts of the video camera associated with
generating and processing metadata as a separate acquisition unit associated with a
further example PDA,

Figure 5 is a pictorial representation providing an example of the form of the
acquisition unit shown in figure 4,

Figure 6 is a part schematic part pictorial representation illustrating an example
of the connection between the acquisition unit and the video camera of figure 4,

Figure 7 is a part schematic block diagram of an ingestion processor coupled to
a network, part flow diagram illustrating the ingestion of metadata and audio/video
material items,

Figure 8 is a pictorial representation of the ingestion processor shown in figure
7,

Figure 9 is a part schematic block diagram part pictorial representation of the
ingestion processor shown in figures 7 and 8 shown in more detail,

Figure 10 is a schematic block diagram showing the ingestion processor shown
in operative association with the database of figure 7,

Figure 11 is a schematic block diagram [illegible] operation of the ingestion
processor shown in figure 7,

Figure 12a is a schematic representation of the generation of picture stamps at
sample times of audio/video material,

Figure 12b is a schematic representation of the generation of text samples with
respect to time of the audio/video material,

Figure 13 provides an illustrative representation of an example structure for
organising metadata,

Figure 14 is a schematic block diagram illustrating the structure of a data
reduced UMID, and

Figure 15 is a schematic block diagram illustrating the structure of an extended
UMID.

Description of Preferred Embodiments 
Acquisition Unit

Embodiments of the present invention relate to audio and/or video generation
apparatus which may be for example television cameras, video cameras or camcorders.
An embodiment of the present invention will now be described with reference to figure
1 which provides a schematic block diagram of a video camera which is arranged to
communicate with a personal digital assistant (PDA). A PDA is an example of a data
processor which may be arranged in operation to generate metadata in accordance with
a user's requirements. The term personal digital assistant is known to those acquainted
with the technical field of consumer electronics as a portable or hand held personal
organiser or data processor which includes an alphanumeric key pad and a hand writing
interface.

In figure 1 a video camera 101 is shown to comprise a camera body 102 which
is arranged to receive light from an image source falling within a field of view of an
imaging arrangement 104 which may include one or more imaging lenses (not shown).
The camera also includes a view finder 106 and an operating control unit 108 from
which a user can control the recording of signals representative of the images formed
within the field of view of the camera. The camera 101 also includes a microphone
110 which may be a plurality of microphones arranged to record sound in stereo. Also
shown in figure 1 is a hand-held PDA 112 which has a screen 114 and an alphanumeric
key pad 116 which also includes a portion to allow the user to write characters
recognised by the PDA. The PDA 112 is arranged to be connected to the video camera
101 via an interface 118. The interface 118 is in accordance with a
predetermined standard format such as, for example, an RS232 or the like.
The interface 118 may also be [illegible] for
a wireless communications link. The interface 118 provides a
[illegible] 101. The function of
the PDA 112 will be explained in more detail shortly. However in general the PDA
112 provides a facility for sending and receiving metadata generated using the
[illegible]
by the video camera 101. A better understanding of the operation of the video camera
101 in combination with the PDA 112 may be gathered from figure 2 which shows a
more detailed representation of the body 102 of the video camera which is shown in
figure 1 and in which common parts have the same numerical designations.

In figure 2 the camera body 102 is shown to comprise a tape drive 122 having
read/write heads 124 operatively associated with a magnetic recording tape 126. Also
shown in figure 2 the camera body includes a metadata generation processor 128
coupled to the tape drive 122 via a connecting channel 130. Also connected to the
metadata generation processor 128 is a data store 132, a clock 136 and three sensors
138, 140, 142. The interface unit 118 sends and receives data, also shown in
figure 2, via a wireless channel 119. Correspondingly two connecting channels for receiving
[illegible]
generation processor 128 via corresponding connecting channels 148 and 150. The
metadata generation processor is also shown to receive via a connecting channel
151 the audio/video signals generated by the camera. The audio/video signals are also fed
to the tape drive 122 to be recorded on to the tape 126.

The video camera 101 shown in figure 1 operates to record [illegible]
falling within the field of view of the lens arrangement 104 onto a recording medium.
[illegible]
detected by the microphone 110 and arranged to be recorded as audio signals on the
recording medium with the video signals. As shown in figure 2, the recording [illegible]
recording tape 126 by the read/write heads 124. The arrangement by which the video
signals and the audio signals are recorded by the read/write heads 124 onto the
magnetic tape 126 is not shown in figure 2 and will not be further described as this
does not provide any greater illustration of the example embodiment of the present
invention. However once a user has captured visual images and recorded these images
using the magnetic tape 126 along with the accompanying audio signals, metadata
describing the content of the audio/video signals may be input using the PDA 112. As
will be explained shortly this metadata can be information that identifies the
audio/video signals in association with a pre-planned event, such as a "take". As
shown in figure 2 the interface unit 118 provides a facility whereby the metadata added
by the user using the PDA 112 may be received within the camera body 102. Data
signals may be received via the wireless channel 119 at the interface unit 118. The
interface unit 118 serves to convert these signals into a form in which they can be
processed by the acquisition processor 128 which receives these data signals via the
connecting channels 148, 150.

Metadata is generated automatically by the metadata generation processor 128
in association with the audio/video signals which are received via the connecting
channel 151. In the example embodiment illustrated in figure 2, the metadata
generation processor 128 operates to generate time codes with reference to the clock
136, and to write these time codes on to the tape 126 in a linear recording track
provided for this purpose. The time codes are formed by the metadata generation
processor 128 from the clock 136. Furthermore, the metadata generation processor
128 forms other metadata automatically such as a UMID, which identifies uniquely the
audio/video signals. The metadata generation processor may operate in combination
with the tape drive 122, to write the UMID on to the tape with the audio/video signals.
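The formation of time codes from a clock can be sketched as follows. This is only an illustrative sketch and not the processor's actual implementation: it assumes a frame counter derived from the clock, a rate of 25 frames per second without drop-frame correction, and an HH:MM:SS:FF text form. The function names are hypothetical.

```python
FPS = 25  # assumption: 25 frames per second, non-drop-frame


def frames_to_timecode(frame_count: int, fps: int = FPS) -> str:
    """Format an absolute frame count as an HH:MM:SS:FF time code."""
    if frame_count < 0:
        raise ValueError("frame_count must be non-negative")
    seconds, frames = divmod(frame_count, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def timecode_to_frames(timecode: str, fps: int = FPS) -> int:
    """Convert an HH:MM:SS:FF time code back to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames
```

Keeping the time code as a frame count internally and formatting it only when it is written out makes it straightforward to compare or subtract in and out points.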

In an alternative embodiment, the UMID, as well as other metadata may be
stored in the data store 132 and communicated separately from the tape 126. In this
case, a tape ID is generated by the metadata generation processor 128 and written on to
the tape 126, to identify the tape 126 from other tapes.

In order to generate the UMID, and other metadata identifying the contents of
the audio/video signals, the metadata generation processor 128 is arranged in operation
[illegible] provide the metadata generation processor with [illegible] such as the
[illegible] of the camera lens 104, the shutter speed and a signal received via the control
unit [illegible]
are generated by the sensors 138, 140, 142 and received at the metadata generation
processor 128. The metadata generation processor in the example embodiment is
[illegible]
used by the camera in generating the video signals. Furthermore the metadata
generation processor 128 [illegible] the status of the camcorder [illegible]
whether audio/video signals are being recorded by the tape drive. When
RECORD START is detected the IN POINT time code is captured and a UMID
is generated in correspondence with the IN POINT time code. Furthermore in some
embodiments an extended UMID is generated, in which case the metadata generation
processor is arranged to receive spatial co-ordinates which are representative of the
location at which the audio/video signals are acquired. The spatial co-ordinates may
be generated by a receiver which operates in accordance with the Global Positioning
System (GPS). The receiver may be external to the camera, or may be embodied
within the camera body 102.

When the end of recording is detected, the OUT POINT time code is captured by
the metadata generation processor 128. As explained above, it is possible to generate a
"good shot" marker. The "good shot" marker is generated during the recording
process, and detected by the metadata generation processor. The "good shot" marker
is then either stored on the tape, or within the data store 132, with the corresponding IN
POINT and OUT POINT time codes.
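The capture of IN POINT and OUT POINT time codes around a recording, together with the "good shot" marker, can be sketched as a small state machine. The sketch below is illustrative only. The class and method names are hypothetical, the time codes are supplied by the caller as text, and a random 128-bit hexadecimal string from `uuid` stands in for the identifier; it is not a real UMID and does not follow the UMID byte layout.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Take:
    material_id: str                 # stand-in identifier, NOT a real UMID
    in_point: str                    # time code captured when recording starts
    out_point: Optional[str] = None  # time code captured when recording ends
    good_shot: bool = False          # set if a "good shot" marker was received


class TakeLogger:
    """Hypothetical logger tracking one take at a time."""

    def __init__(self) -> None:
        self.takes: List[Take] = []
        self._current: Optional[Take] = None

    def record_start(self, timecode: str) -> Take:
        # Recording has started: capture the IN POINT and create an identifier.
        if self._current is not None:
            raise RuntimeError("a take is already being recorded")
        self._current = Take(material_id=uuid.uuid4().hex, in_point=timecode)
        return self._current

    def mark_good_shot(self) -> None:
        # The marker is only meaningful while a take is being recorded.
        if self._current is None:
            raise RuntimeError("no take is being recorded")
        self._current.good_shot = True

    def record_stop(self, timecode: str) -> Take:
        # Recording has ended: capture the OUT POINT and store the take.
        if self._current is None:
            raise RuntimeError("no take is being recorded")
        take, self._current = self._current, None
        take.out_point = timecode
        self.takes.append(take)
        return take
```

A caller would invoke `record_start` and `record_stop` with the time codes current at those moments, and `mark_good_shot` at any point in between; the completed takes then hold the marker alongside the corresponding in and out points.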

As already indicated above, the PDA 112 is used to facilitate identification of
the audio/video material generated by the camera. To this end, the PDA
[illegible] audio/video material with pre-planned events such as scenes [illegible]

The camera and PDA shown in figures 1 and 2 form part of an integrated
system for planning, acquiring, editing an audio/video production. During a planning
stage, the scenes which are required in order to produce an audio/video production are
identified. Furthermore for each scene a number of shots are identified which are
required in order to establish the scene. Within each shot a number of takes may be
generated and from these takes a selected number may be used to form the shot for the
final edit. The planning information in this form is therefore identified at a planning
stage. Data representing or identifying each of the planned scenes and shots is
therefore loaded into the PDA 112 along with notes which will assist the director when
the audio/video material is captured. An example of such data is shown in the table
below.


A/V Production          News story: BMW disposes of Rover
Scene ID: 900015689     Outside Longbridge
Shot 5000000199         Longbridge BMW Sign
Shot 5000000200         Workers Leaving shift
Shot 5000000201         Workers in car park
Scene ID: 900015690     BMW HQ Munich
Shot 5000000202         Press conference
Shot 5000000203         Outside BMW building
Scene ID: 900015691     Interview with minister
Shot 5000000204         Interview



In the first column of the table above the event which will be captured by the
camera and for which audio/video material will be generated is shown. Each of the
events, which are defined in a hierarchy, is provided with an identification number.
Correspondingly, in the second column notes are provided in order to direct or remind
the director of the content of the planned shot or scene. For example, in the first row
the audio/video production is identified as being a news story, reporting the disposal of
Rover by BMW. In the extract of the planning information shown in the table,
there are three scenes, each of which is provided with a unique identification number.
These scenes are "Outside Longbridge", "BMW HQ Munich" and "Interview
with minister". Correspondingly for each scene a number of shots are identified and
these are shown below each of the scenes with a unique shot identification number.
Notes corresponding to the content of each of these shots are also entered in the second
column. So, for example, for the first scene "Outside Longbridge", three shots are
identified which are "Longbridge BMW Sign", "Workers Leaving shift" and "Workers in
car park". With this information loaded onto the PDA, the director or indeed a single
camera man may take the PDA out to the place where the news story is to be shot, so
that the planned audio/video material can be gathered. An illustration of the form of
the PDA with the graphical user interface displaying this information is shown in
figure 3.
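The hierarchy of production, scenes and shots described above can be represented, by way of a sketch only, as nested records. The class names and the `find_shot` helper are hypothetical and are not taken from the specification; the identification numbers and notes are those given in the planning table.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Shot:
    shot_id: str
    note: str


@dataclass
class Scene:
    scene_id: str
    note: str
    shots: List[Shot] = field(default_factory=list)


@dataclass
class Production:
    note: str
    scenes: List[Scene] = field(default_factory=list)

    def find_shot(self, shot_id: str) -> Tuple[Scene, Shot]:
        """Return the scene and shot matching a shot identification number."""
        for scene in self.scenes:
            for shot in scene.shots:
                if shot.shot_id == shot_id:
                    return scene, shot
        raise KeyError(shot_id)


# Planning information as listed in the table.
plan = Production(
    note="News story: BMW disposes of Rover",
    scenes=[
        Scene("900015689", "Outside Longbridge", [
            Shot("5000000199", "Longbridge BMW Sign"),
            Shot("5000000200", "Workers Leaving shift"),
            Shot("5000000201", "Workers in car park"),
        ]),
        Scene("900015690", "BMW HQ Munich", [
            Shot("5000000202", "Press conference"),
            Shot("5000000203", "Outside BMW building"),
        ]),
        Scene("900015691", "Interview with minister", [
            Shot("5000000204", "Interview"),
        ]),
    ],
)
```

Looking a shot up by its identification number, for example `plan.find_shot("5000000202")`, returns both the shot and its parent scene, which is the information a display of the current shot would need.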

As indicated in figure 1, the PDA 112 is arranged to communicate data to the
camera 101. To this end the metadata generation processor 128 is arranged to
communicate data with the PDA 112 via the interface 118. The interface 118 may be
for example an infra-red link 119 providing wireless communications in accordance
with a known standard. The PDA and the parts of the camera associated with
generating metadata which are shown in figure 2 are shown in more detail in figure 4.

In figure 4 the parts of the camera which are associated with generating
metadata and communicating with the PDA 112 are shown in a separate acquisition
unit 152. However it will be appreciated that the acquisition unit 152 could also be
embodied within the camera 102. The acquisition unit 152 comprises the metadata
generation processor 128, and the data store 132. The acquisition unit 152 also
includes the clock 136 and the sensors 138, 140, 142 although for clarity these are not
shown in figure 4. Alternatively, some or all of these features which are shown in
figure 2 will be embodied within the camera 102 and the signals which are required to
define the metadata such as the time codes and the audio/video signals themselves may
be communicated via a communications link 153 which is coupled to an interface port
154. The metadata generation processor 128 is therefore provided with access to the
time codes and the audio/video material as well as other parameters used in generating
the audio/video material. Signals representing the time codes and parameters as well
as the audio/video signals are received from the interface port 154 via the interface
channel 156. The acquisition unit 152 is also provided with a screen (not shown)
which is driven by a screen driver 158. Also shown in figure 4 the acquisition unit is
provided with a communications processor 160 which is coupled to the metadata
generation processor 128 via a connecting channel 162. Communication is effected
by the communications processor 160 via a radio frequency communications channel
using the antenna 164. A pictorial representation of the acquisition unit 152 is shown
in figure 5.

The PDA 112 is also shown in figure 4. The PDA 112 is correspondingly
provided with an infra-red communications port 165 for communicating data to and
from the acquisition unit 152 via an infra-red link 119. A data processor 166 within
the PDA 112 is arranged to communicate data to and from the infra-red port 165 via a
connecting channel 166. The PDA 112 is also provided with a data store 167 and a
screen driver 168 which are connected to the data processor 166.

The pictorial representation of the PDA 112 shown in figure 3 and the
acquisition unit shown in figure 5 provide an illustration of an example embodiment of
the present invention. A schematic diagram illustrating the arrangement and
connection of the PDA 112 and the acquisition unit 152 is shown in figure 6. In the
example shown in figure 6 the acquisition unit 152 is mounted on the back of a camera
101 and coupled to the camera via a six pin remote connector and to a connecting
channel conveying the external signal representative of the time code recorded onto the
recording tape. Thus, the six pin remote connector and the time code indicated as
arrow lines form the communications channel 153 shown in figure 4. The interface
port 154 is shown in figure 6 to be a VA to DN1 conversion comprising an
RM-P9/LTC to RS422 converter 154. RM-P9 is a camera remote control protocol, whereas
LTC is Linear Time Code in the form of an analogue signal. This is arranged to
communicate with a RS422 to RS232 converter 154" via a connecting channel which
forms part of the interface port 154. The converter 154" then communicates with the
metadata generation processor 128 via the connecting channel 156 which operates in
accordance with the RS232 standard.

Returning to figure 4, the PDA 112 which has been loaded with the pre-planned
production information is arranged to communicate the current scene and shot
for which audio/video material is to be generated by communicating the next shot ID
number via the infra-red link 119. The pre-planned information may also have been
communicated to the acquisition unit 152 and stored in the data store 132 via a
separate link or via the infra-red communication link 119. However in effect the
acquisition unit 152 is directed to generate metadata in association with the scene or
shot ID number which is currently being taken. After receiving the information of the 
current shot the camera 102 is then operated to make a "take of the shot". The 
audio/video material of the take is recorded onto the recording tape 126 with
corresponding time codes. These time codes are received along with the audio/video
material via the interface port 154 at the metadata generation processor 128. The
metadata generation processor 128, having been informed of the current pre-planned
shot now being taken, logs the time codes for each take of the shot. The metadata
generation processor therefore logs the IN and OUT time codes of each take and stores
these in the data store 132.
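
By way of illustration only, the logging described above might be sketched as follows. The class and method names (TakeLog, set_shot, start_take, end_take) are hypothetical and do not appear in the specification; the sketch merely assumes that the current shot ID is announced first and that an IN and an OUT time code then arrive for each take.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Take:
    number: int
    tc_in: str                      # IN time code, "HH:MM:SS:FF"
    tc_out: Optional[str] = None    # OUT time code, set when the take ends


@dataclass
class TakeLog:
    """Hypothetical stand-in for the log kept in the data store."""
    current_shot: Optional[str] = None
    takes: Dict[str, List[Take]] = field(default_factory=dict)

    def set_shot(self, shot_id: str) -> None:
        # Corresponds to the PDA announcing the next shot ID.
        self.current_shot = shot_id
        self.takes.setdefault(shot_id, [])

    def start_take(self, tc_in: str) -> Take:
        if self.current_shot is None:
            raise RuntimeError("no current shot has been announced")
        shot_takes = self.takes[self.current_shot]
        take = Take(number=len(shot_takes) + 1, tc_in=tc_in)
        shot_takes.append(take)
        return take

    def end_take(self, tc_out: str) -> Take:
        if self.current_shot is None or not self.takes[self.current_shot]:
            raise RuntimeError("no take is in progress")
        take = self.takes[self.current_shot][-1]
        take.tc_out = tc_out
        return take


log = TakeLog()
log.set_shot("5000000199")
log.start_take("00:03:45:29")
log.end_take("00:04:21:05")
log.start_take("00:04:21:20")
log.end_take("00:04:28:15")
```

Keying the takes by shot ID keeps the hierarchy of shots and takes explicit, which is one possible reading of the hierarchical logging described in the text.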

The information generated and logged by the metadata generation processor
128 is shown in the table below. In the first column the scene and shot are identified
with the corresponding ID numbers, and for each shot several takes are made by the
camera operator which are indicated in a hierarchical fashion. Thus, having received
information from the PDA 112 of the current shot, each take made by the camera
operator is logged by the metadata generation processor 128 and the IN and OUT
points for this take are shown in the second and third columns and stored in the data
store 132. This information may also be displayed on the screen of the acquisition unit
152 as shown in figure 5. Furthermore, the metadata generation processor 128 as
already explained generates the UMID for each take for the audio/video material
generated during the take. The UMID for each take forms the fourth column of the
table. Additionally, in some embodiments, to provide a unique identification of the
tape onto which the material is recorded, a tape identification is generated and
associated with the metadata. The tape identification may be written on to the tape, or
stored on a random access memory chip which is embodied within the video tape
cassette body. This random access memory chip is known as a TELEFILE (RTM)
system which provides a facility for reading the tape ID number remotely. The tape ID
is written onto the magnetic tape 126 to uniquely identify this tape. In preferred
embodiments the TELEFILE (RTM) system is provided with a unique number which is
manufactured as part of the memory and so can be used as the tape ID number. In
other embodiments the TELEFILE (RTM) system provides automatically the IN/OUT
time codes of the recorded audio/video material items.

In one embodiment the information shown in the table below is arranged to be
recorded onto the magnetic tape in a separate recording channel. However, in other
embodiments the metadata shown in the table is communicated separately from the
tape 126 using either the communications processor 160 or the infra-red link 119. The
metadata may be received by the PDA 112 for analysis and may be further
communicated by the PDA.



Scene ID: 900015689   Tape ID: 00001                          UMID:

Shot 5000000199
    Take 1            IN: 00:03:45:29    OUT: 00:04:21:05     060C23B340..
    Take 2            IN: 00:04:21:20    OUT: 00:04:28:15     060C23B340..
    Take 3            IN: 00:04:28:20    OUT: 00:05:44:05     060C23B340..

Shot 5000000200
    Take 1            IN: 00:05:44:10    OUT: 00:08:22:05     060C23B340..
    Take 2            IN: 00:08:22:10    OUT: 00:08:23:05     060C23B340..
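
As a worked example of how the IN and OUT time codes in the table might be used, the following sketch converts a time code to a frame count and derives the duration of a take. The frame rate of 30 frames per second (non-drop-frame) is an assumption made for illustration; the specification does not state a frame rate, and the function names are hypothetical.

```python
FPS = 30  # assumption: 30 frames per second, non-drop-frame


def timecode_to_frames(tc: str, fps: int = FPS) -> int:
    """Convert "HH:MM:SS:FF" into a total frame count."""
    hours, minutes, seconds, frames = (int(part) for part in tc.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is not valid at {fps} fps")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def frames_to_timecode(total: int, fps: int = FPS) -> str:
    """Convert a total frame count back into "HH:MM:SS:FF"."""
    seconds, frames = divmod(total, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


def take_duration(tc_in: str, tc_out: str, fps: int = FPS) -> str:
    """Duration between the IN and OUT points, expressed as a time code."""
    length = timecode_to_frames(tc_out, fps) - timecode_to_frames(tc_in, fps)
    return frames_to_timecode(length, fps)


# Take 1 of shot 5000000199 from the table.
duration = take_duration("00:03:45:29", "00:04:21:05")
print(duration)  # 00:00:35:06
```

Under the stated assumption, the IN point is frame 6779 and the OUT point is frame 7835, so the take is 1056 frames long, which is 35 seconds and 6 frames.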











The communications processor 160 may be arranged in operation to transmit
the metadata generated by the metadata generation processor 128 via a wireless
communications link. The metadata may be received via the wireless communications
link by a remotely located studio which can then acquire the metadata and process this
metadata ahead of the audio/video material recorded onto the magnetic tape 126. This
provides an advantage in improving the rate at which the audio/video production may
be generated during the post production phase in which the material is edited.

A further advantageous feature provided by embodiments of the present
invention is an arrangement in which a picture stamp is generated at certain temporal
positions within the recorded audio/video signals. A picture stamp is known to those
skilled in the art as being a digital representation of an image and in the present
example embodiment is generated from the moving video material generated by the
camera. The picture stamp may be of lower quality in order to reduce the amount of
data required to represent the image from the video signals. Therefore the picture
stamp may be compression encoded which may result in a reduction in quality.
However a picture stamp provides a visual indication of the content of the audio/video 
material and therefore is a valuable item of metadata. Thus, the picture stamp may for
example be generated at the IN and OUT time codes of a particular take. Thus, the
picture stamps may be associated with the metadata generated by the metadata
generation processor 128 and stored in the data store 132. The picture stamps are
therefore associated with items of metadata such as, for example, the time codes which
identify the place on the tape where the image represented by the picture stamp is
recorded. The picture stamps may be generated with the "Good Shot" markers. The
picture stamps are generated by the metadata generation processor 128 from the
audio/video signals received via the communications link 153. The metadata
generation processor therefore operates to effect a data sampling and compression
encoding process in order to produce the picture stamps. Once the picture stamps have
been generated they can be used for several purposes. They may be stored in a data file
and communicated separately from the tape 126, or they may be stored on the tape 126
in compressed form in a separate recording channel. Alternatively in preferred
embodiments picture stamps may be communicated using the communications
processor 160 to the remotely located studio where a producer may analyse the picture
stamps. This provides the producer with an indication as to whether the audio/video
material generated by the camera operator is in accordance with what is required.
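
The data sampling and compression encoding step might, purely as a sketch, look like the following. It assumes a frame is available as rows of 8-bit luminance values, reduces resolution by keeping every fourth sample in each direction, and uses zlib as a stand-in encoder. zlib is lossless, so in this sketch the reduction in quality comes only from the sub-sampling; the specification does not name a particular encoder, and the function name is hypothetical.

```python
import zlib


def make_picture_stamp(frame, step=4):
    """Sub-sample a frame and compress the result.

    frame: list of rows, each row a list of luminance values in 0..255.
    Returns (width, height, compressed_bytes) of the reduced image.
    """
    sampled = [row[::step] for row in frame[::step]]
    height = len(sampled)
    width = len(sampled[0]) if sampled else 0
    payload = bytes(value for row in sampled for value in row)
    return width, height, zlib.compress(payload)


# A synthetic 64 x 48 test frame (illustrative data only).
frame = [[(x + y) % 256 for x in range(64)] for y in range(48)]
width, height, stamp = make_picture_stamp(frame, step=4)
```

The reduced image here is 16 by 12 samples, so the stamp carries 192 samples rather than the 3072 in the full frame before any compression is applied.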

In a yet further embodiment, the picture stamps are communicated to the PDA
112 and displayed on the PDA screen. This may be effected via the infra-red port 119
or the PDA may be provided with a further wireless link which can communicate with
the communications processor 160. In this way a director having the hand held PDA
112 is provided with an indication of the current audio/video content generated by the
camera. This provides an immediate indication of the artistic and aesthetic quality of the
audio/video material currently being generated. As already explained the picture
stamps are compression encoded so that they may be rapidly communicated to the
PDA.



A further advantage of the acquisition unit 152 shown in Figure 4 is that the
editing process is made more efficient by providing the editor at a remotely located
studio with an indication of the content of the audio/video material in advance of
receiving that material. This is because the picture stamps are communicated with the
metadata via a wireless link. In this way the bandwidth of the audio/video material
can remain high with a correspondingly high quality whilst the metadata and picture
stamps are at a relatively low band width providing relatively low quality information.
As a result of the low band width the metadata and picture stamps may be communicated
via a wireless link on a considerably lower band width channel. This facilitates rapid
communication of the metadata describing content of the audio/video material.

The picture stamps generated by the metadata generation processor 128 can be
generated at any point during the recorded audio/video material. In one embodiment the
picture stamps are generated at the IN and OUT points of each take. However in other
embodiments of the present invention an activity processor 170 is arranged to detect
relative activity within the video material. This is effected by performing a process in
which a histogram of the colour components of the images represented by the video
signal is compiled, the rate of change of the colour components is determined, and
changes in these colour components are used to indicate activity within the image.
Alternatively or in addition, motion vectors within the image are used to indicate
activity. The activity processor 170 then operates to generate a signal indicative of the
relative activity within the video material. The metadata generation processor 128 then
operates in response to the activity signal to generate picture stamps such that more
picture stamps are generated for greater activity within the images represented by the
video signals.
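
One possible reading of the histogram-based activity detection is sketched below. It assumes each frame is a sequence of (r, g, b) pixels, measures activity as the sum of absolute differences between the normalised colour histograms of successive frames, and marks a picture stamp wherever that measure exceeds a threshold. The bin count, the distance measure and the threshold are all illustrative assumptions; the specification does not fix any of them.

```python
def colour_histogram(frame, bins=8):
    """Normalised histogram of the R, G and B components of a frame.

    frame: sequence of (r, g, b) tuples with components in 0..255.
    """
    hist = [0] * (3 * bins)
    count = 0
    for pixel in frame:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value * bins // 256] += 1
        count += 1
    return [h / count for h in hist] if count else hist


def activity_signal(frames, bins=8):
    """Histogram change between each frame and the one before it."""
    signal = []
    previous = None
    for frame in frames:
        hist = colour_histogram(frame, bins)
        if previous is None:
            signal.append(0.0)
        else:
            signal.append(sum(abs(a - b) for a, b in zip(hist, previous)))
        previous = hist
    return signal


def stamp_positions(signal, threshold=0.5):
    """Frame indices at which a picture stamp would be generated."""
    return [index for index, level in enumerate(signal) if level > threshold]


# Synthetic frames: two dark, two bright, one dark (illustrative data only).
dark = [(10, 10, 10)] * 100
bright = [(240, 240, 240)] * 100
frames = [dark, dark, bright, bright, dark]
signal = activity_signal(frames)
positions = stamp_positions(signal)
print(positions)  # [2, 4]
```

With a fixed threshold, passages with more frequent histogram changes yield more stamps, which corresponds to generating more picture stamps for greater activity.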

In an alternative embodiment of the present invention the activity processor
170 is arranged to receive the audio signals via the connecting channel 172 and to
recognise speech and to generate
content data representative of the content of this speech as text. The text data is then
communicated to the data processor 128 and may be stored in the data store 132 or
communicated with other metadata via the communications processor 160 in a similar
way to that already explained for the picture stamps.

Figure 7 provides a schematic representation of a post production process in
which the audio/video material is edited to produce an audio/video program. As
shown in figure 7 the metadata, which may include picture stamps and/or the speech
content information, is communicated from the acquisition unit 152 via a separate route
represented by a broken line 174, to a metadata database 176. The route 174 may be
representative of a wireless communications link formed by for example UMTS, GSM
or the like.

The database 176 stores metadata to be associated with the audio/video
material. The audio/video material in high quality form is recorded onto the tape 126.
Thus the tape 126 is transported back to the editing suite where it is ingested by an
ingestion processor 178. The tape identification (tape ID) recorded onto the tape 126
or other metadata providing an indication of the content of the audio/video material is
used to associate the metadata stored in the data store 176 with the audio/video
material on the tape as indicated by the broken line 180.

As will be appreciated, although the example embodiment of the present
invention uses a video tape as the recording medium for storing the audio/video
signals, alternative recording media such as magnetic disks
and random access memories may also be used.

Ingestion Processor


The ingestion processor 178 is also shown in Figure 7 to be connected to a 
network formed from a communications channel represented by a connecting line 182. 
The connecting line 182 represents a communications channel for communicating data
to items of equipment, which form an inter-connected network. To this end, these
items of equipment are provided with a network card which may operate in accordance
with a known access technique such as Ethernet, RS422 and the like. Furthermore, as
will be explained shortly, the communications network 182 may also provide data
communications in accordance with the Serial Digital Interface (SDI) or the Serial
Digital Transport Interface (SDTI).

Also shown connected to the communications network 182 is the metadata
database 176, and an audio/video server 190, into which the audio/video material is
ingested. Furthermore, editing terminals 184, 186 are also connected to the
communications channel 182 along with a digital multi-effects processor 188.

The communications network 182 provides access to the audio/video material
present on tapes, discs or other recording media which are loaded into the ingestion
processor 178.

The metadata database 176 is arranged to receive metadata via the route 174 
describing the content of the audio/video material recorded on to the recording media 
loaded into the ingestion processor 178. 

As will be appreciated, although in the example embodiment a video tape has
been used as the recording medium for storing the audio/video signals, alternative
recording media such as magnetic disks and random access
memories may also be used, and video tape is provided as an illustrative example
only.

The editing terminals 184, 186 and the digital multi-effects processor 188 are provided
with access to the audio/video material recorded on to the tapes loaded into the
ingestion processor 178 and the metadata describing this audio/video material stored in
the metadata database 176 via the communications network 182. The operation of the
ingestion processor 178 in combination with the metadata database 176 will now
be described in more detail.

Figure 8 provides an example representation of the ingestion processor 178. In
Figure 8 the ingestion processor 178 is shown to have a jog shuttle control 200 for
navigating through the audio/video material recorded on the tapes loaded into video
tape recorders/reproducers forming part of the ingestion processor 178. The ingestion
processor 178 also includes a display screen 202 which is arranged to display picture
stamps which describe selected parts of the audio/video material. The display screen
202 also acts as a touch screen providing a user with the facility for selecting the
audio/video material by touch. The ingestion processor 178 is also arranged to display
all types of metadata on the screen 202 which includes script, camera type, lens types
and UMIDs.

As shown in Figure 9, the ingestion processor 178 may include a plurality of
video tape recorders/reproducers into which the video tapes onto which the
audio/video material is recorded may be loaded in parallel. In the example shown in
figure 9, the video tape recorders 204 are connected to the ingestion processor 178 via
an RS422 link and an SDI IN/OUT link. The ingestion processor 178 therefore
represents a data processor which can access any of the video tape recorders 204 in
order to reproduce the audio/video material from the video tapes loaded into the video
tape recorders. Furthermore, the ingestion processor 178 is provided with a network
card in order to access the communications network 182. As will be appreciated from
Figure 9 however, the communications channel 182 is comprised of a relatively low
band width data communications channel 182' and a high band width SDI channel
182" for use in streaming video data. Correspondingly, therefore the ingestion
processor 178 is connected to the video tape recorders 204 via an RS422 link in order
to communicate requests for corresponding items of audio/video material. Having
requested these items of audio/video material, the audio/video material is
communicated back to the ingestion processor 178 via an SDI communication link
for distribution via the SDI network. The requests may for example include the UMID
which uniquely identifies the audio/video material item(s).

The operation of the ingestion processor in association with the metadata
database 176 will now be explained with reference to figure 10. In figure 10 the
metadata database 176 is shown to include a number of items of metadata 210
associated with a particular tape ID 212. As shown by the broken line headed arrow
214, the tape ID 212 identifies a particular video tape 216 on which the audio/video
material corresponding to the metadata 210 is recorded. In the example embodiment
shown in Figure 10, the tape ID 212 is written onto the video tape 218 in the linear
time code area 220. However it will be appreciated that in other embodiments, the
tape ID could be written in other places such as the vertical blanking portion. The
video tape 216 is loaded into one of the video tape recorders 204 forming part of the
ingestion processor 178.

In operation one of the editing terminals 184 is arranged to access the metadata
database 176 via the low band width communications channel 182'. The editing terminal
184 is therefore provided with access to the metadata 210 describing the content of the
audio/video material recorded onto the tape 216. The metadata 210 may include
information such as the copyright owner "BSkyB", the resolution of the picture and the
format in which the video material is encoded, the name of the program, which is in this
case "Grandstand", and information such as the date, time and audience. Metadata may
further include a note of the content of the audio/video material.

Each of the items of audio/video material is associated with a UMID, which
identifies the audio/video material. As such, the editing terminal 184 can be used to
identify and select from the metadata 210 the items of audio/video material which are
required in order to produce a program. This material may be identified by the UMID
associated with the material. In order to access the audio/video material to produce the
program, the editing terminal 184 communicates a request for this material via the low
band width communications network 182'. The request includes the UMID or the
UMIDs identifying the audio/video material item(s). In response to the request for
audio/video material received from the editing terminal 184, the ingestion processor
178 is arranged to reproduce selectively these audio/video material items identified by
the UMID or UMIDs, from the video tape recorder into which the video tape 216 is
loaded. This audio/video material is then streamed via the SDI network 182" back to
the editing terminal 184 to be incorporated into the audio/video production being
edited. The streamed audio/video material is ingested into the audio/video server 190
from where the audio/video can be stored and reproduced.
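
The request handling described above might be sketched as follows. All names here (MaterialItem, MetadataDatabase, IngestionProcessor, handle_request) are hypothetical, and the UMID strings are placeholders, since the UMIDs shown in the specification are truncated. The sketch assumes that the metadata database maps each UMID to a tape ID and to IN and OUT time codes, and that the ingestion processor knows which tapes are currently loaded.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class MaterialItem:
    umid: str       # placeholder identifier, not a real UMID
    tape_id: str
    tc_in: str
    tc_out: str


class MetadataDatabase:
    """Hypothetical store mapping UMIDs to material item metadata."""

    def __init__(self) -> None:
        self._items: Dict[str, MaterialItem] = {}

    def add(self, item: MaterialItem) -> None:
        self._items[item.umid] = item

    def lookup(self, umid: str) -> MaterialItem:
        return self._items[umid]


class IngestionProcessor:
    """Resolves UMID requests to the tape sections to be reproduced."""

    def __init__(self, database: MetadataDatabase, loaded_tapes: Set[str]) -> None:
        self.database = database
        self.loaded_tapes = loaded_tapes

    def handle_request(self, umids: List[str]) -> List[MaterialItem]:
        selected = []
        for umid in umids:
            item = self.database.lookup(umid)
            if item.tape_id not in self.loaded_tapes:
                raise LookupError(f"tape {item.tape_id} is not loaded")
            selected.append(item)
        return selected


database = MetadataDatabase()
database.add(MaterialItem("UMID-EXAMPLE-1", "00001", "00:03:45:29", "00:04:21:05"))
database.add(MaterialItem("UMID-EXAMPLE-2", "00001", "00:04:21:20", "00:04:28:15"))

processor = IngestionProcessor(database, loaded_tapes={"00001"})
selected = processor.handle_request(["UMID-EXAMPLE-2"])
```

In this sketch only the small request and the metadata lookup would travel over the low band width channel; the selected sections would then be streamed separately.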

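Figure 11 provides an alternative arrangement in which the metadata 210
is recorded onto a suitable recording medium with the audio/video material. For example
the metadata 210 could be recorded in one of the audio tracks of the video tape 218.
Alternatively, the recording medium may be an optical disc or magnetic disc allowing
random access and providing a greater capacity for storing data. In this
case the metadata 210 may be stored with the audio/video material.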

In a yet further arrangement, some or all of the metadata may be recorded onto
the tape 216. This may be recorded, for example, into the linear recording track of the
tape. Some metadata related to the metadata recorded onto the tape may be
conveyed separately and stored in the database 176. A further step is required in order
to ingest the metadata and to this end the ingestion processor 178 is arranged to read
the metadata from the recording medium and convey the metadata via the
communications network 182 to the metadata database 176. Therefore, it will be
appreciated that the metadata associated with the audio/video material to be ingested
by the ingestion processor 178 may be ingested into the database 176 via a separate
medium or via the recording medium on which the audio/video material is also
recorded.

The metadata associated with the audio/video material may also include picture
stamps which represent low quality representations of the images at various points
throughout the video material. These may be presented at the touch screen 202 on the
ingestion processor 178. Furthermore these picture stamps may be conveyed via the
network 182' to the editing terminals 184, 186 or the effects processor 188 to provide
an indication of the content of the audio/video material. The editor is therefore
provided with a pictorial representation for the audio/video material and from this a
selection of audio/video material items may be made. Furthermore, the picture
stamps may be stored in the database 176 as part of the metadata 210. The editor may
therefore retrieve a selected item for the corresponding picture stamp using the UMID
which is associated with the picture stamp.

In other embodiments of the invention, the recording medium may not have 
sufficient capacity to include picture stamps recorded with the audio/video material. 
This is likely to be so if the recording medium is a video tape 216. It is particularly
appropriate in this case, although not exclusively so, to generate picture stamps before 
or during ingestion of the audio/video material. 

Returning to figure 7, in other embodiments, the ingestion processor 178 may
include a pre-processing unit. The pre-processing unit embodied within the ingestion
processor 178 is arranged to receive the audio/video material recorded onto the
recording medium which, in the present example, is a video tape 126. To this end, the
pre-processing unit may be provided with a separate video recorder/reproducer or may
be combined with the video tape recorder/reproducer which forms part of the ingestion
processor 178. The pre-processing unit generates picture stamps associated with the
audio/video material. As explained above, the picture stamps are used to provide a
pictorial representation of the content of the audio/video material items. However in
accordance with a further embodiment of the present invention the pre-processing unit
operates to process the audio/video material and generate an activity indicator
representative of relative activity within the content of the audio/video material. This
may be achieved for example using a processor which operates to generate an activity
signal in accordance with a histogram of colour components within the images
represented by the video signal and to generate the activity signals in accordance with a
rate of change of the colour histogram components. The pre-processing unit then
operates to generate a picture stamp at points throughout the video material where
there are periods of activity indicated by the activity signal. This is represented in
Figure 12. In Figure 12A picture stamps 224 are shown to be generated along a line
226 which represents time within the video signal. As shown in figure 12A the
picture stamps 224 are generated at times along the time line 226 where the activity
signal represented as arrows 228 indicates events of activity. This might be for
example someone walking into and out of the field of view of the camera where there
is a great deal of motion represented by the video signal. To this end, the activity
signal may also be generated using motion vectors which may be, for example, the
motion vectors generated in accordance with the MPEG standard.

In other embodiments of the invention, the pre-processor may generate textual
information corresponding to speech present within the audio signal forming part of
the audio/video material items stored on the tape 126. The textual information may be
generated instead of the picture stamps or in addition to the picture stamps. In this
case, text may be generated for example for the first words of sentences and/or the first
activity of a speaker. This is detected from the audio signals present on the tape
recording or forming part of the audio/video material. The start points where text is to
be generated are represented along the time line 226 as arrows 230. Alternatively the
text could be generated at the end of sentences or indeed at other points of interest
within the speech.

At the detected start of the speech, a speech processor operates to generate a
textual representation of the content of the speech. To this end, the time line 226
shown in Figure 12B is shown to include the text 232 corresponding to the content of
the speech at the start of activity periods of speech.

The picture stamps and textual representation of the speech activity generated
by the pre-processor are communicated via the communications channel 182 to the
metadata database 176 and stored. The picture stamps and text are stored in
association with the UMID identifying the corresponding items of audio/video material
from which the picture stamps 224 and the textual information 232 were generated.
This therefore provides a facility to an editor operating one of the editing terminals
184, 186 to analyse the content of the audio/video material before it is ingested using
the ingestion processor 178. As such the video tape 126 is loaded into the ingestion
processor 178 and thereafter the audio/video material can be accessed via the network
communications channel 182. The editor is therefore provided with an indication, very
rapidly, of the content of the audio/video material and so may ingest only those parts of
the material which are relevant to the particular material items required by the editor.
This has a particular advantage in improving the efficiency with which the editor may
produce an audio/video production.

In an alternative embodiment, the pre-processor may be a separate unit and may
be provided with a screen on which the picture stamps and/or text information are
displayed, and a means such as, for example a touch screen, to provide a facility for
selecting the audio/video material items to be ingested.

In a further embodiment of the invention, the ingestion processor 178 generates
metadata items such as UMIDs whilst the audio/video material is being ingested. This
may be required because the acquisition unit in the camera 152 is not arranged to generate
UMIDs, but does generate a Unique Material Reference Number (MURN). The
MURN is generated for each material item, such as a take. The MURN is arranged to
be considerably shorter than a UMID and can therefore be accommodated within the
linear time code of a video tape, which is more difficult for UMIDs because these are
larger. Alternatively the MURN may be written into a TELEFILE (RTM) label of the
tape. The MURN provides a unique identification of the audio/video material items
present on the tape. The MURNs may be communicated separately to the database 176
as indicated by the line 174.

At the ingestion processor 178, the MURNs for the material items are recovered
from the tape or the TELEFILE label. For each MURN, the ingestion processor 178
operates to generate a UMID corresponding to the MURN. The UMIDs are then
communicated with the MURNs to the database 176, and are ingested into the database
in association with the MURNs, which may be already present within the database 176.
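
A minimal sketch of this association step is given below. It assumes the database can be modelled as a mapping from MURN to UMID in which a MURN may already be present without a UMID. The identifier produced here is a random 128-bit hexadecimal string used purely as a placeholder; it is not a real UMID, and the function names and MURN values are hypothetical.

```python
import uuid
from typing import Dict, Iterable, Optional


def generate_placeholder_umid() -> str:
    # Assumption: a random 128-bit hex string stands in for a real UMID.
    return uuid.uuid4().hex.upper()


def ingest_murns(murns: Iterable[str],
                 database: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Associate a UMID with every MURN recovered from the tape or label."""
    for murn in murns:
        # A MURN may already be in the database without a UMID (value None).
        if database.get(murn) is None:
            database[murn] = generate_placeholder_umid()
    return database


# "MURN-0001" was communicated to the database ahead of the tape.
database = {"MURN-0001": None}
ingest_murns(["MURN-0001", "MURN-0002"], database)
```

Running the ingest a second time leaves existing associations untouched, so a tape can be re-read without changing the UMIDs already assigned.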

Camera Metadata

The following is provided, by way of example, to illustrate the possible types of 
metadata generated during the production of a programme, and one possible 
organisational approach to structuring that metadata. 

Figure 13 illustrates an example structure for organising metadata. A number
of tables each comprising a number of fields containing metadata are provided. The
tables may be associated with each other by way of common fields within the
respective tables, thereby providing a relational structure. Also, the structure may
comprise a number of instances of the same table to represent multiple instances of the
object that the table may represent. The fields may be formatted in a predetermined
manner. The size of the fields may also be predetermined. Example sizes include
"Int" which represents 2 bytes, "Long Int" which represents 4 bytes and "Double"
which represents 8 bytes. Alternatively, the size of the fields may be defined with
reference to the number of characters to be held within the field such as, for example,
8, 10, 16, 32, 128, and 255 characters.
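
The predetermined field sizes can be illustrated with Python's struct module. The mapping of the 2-byte, 4-byte and 8-byte field types onto the format codes below, the little-endian byte order, and the null padding of character fields are assumptions made for this sketch only.

```python
import struct

# Assumed mapping of the 2-, 4- and 8-byte field types onto struct codes.
FIELD_FORMATS = {
    "Int": "<h",        # 2-byte signed integer
    "Long Int": "<i",   # 4-byte signed integer
    "Double": "<d",     # 8-byte floating point value
}

sizes = {name: struct.calcsize(fmt) for name, fmt in FIELD_FORMATS.items()}


def pack_text(value: str, length: int) -> bytes:
    """Pack a character field of fixed length, padded with null bytes."""
    encoded = value.encode("ascii")
    if len(encoded) > length:
        raise ValueError(f"{value!r} does not fit in {length} characters")
    return encoded.ljust(length, b"\x00")


packed_title = pack_text("Grandstand", 16)
print(sizes)  # {'Int': 2, 'Long Int': 4, 'Double': 8}
```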

Turning to the structure in more detail, there is provided a Programme Table.
The Programme Table comprises a number of fields including Programme ID (PID),
Title, Working Title, Genre ID, Synopsis, Aspect Ratio, Director ID and Picturestamp.
Associated with the Programme Table is a Genre Table, a Keywords Table, a Script
Table, a People Table, a Schedule Table and a plurality of Media Object Tables.

The Genre Table comprises a number of fields including Genre ID, which is
associated with the Genre ID field of the Programme Table, and Genre Description.

The Keywords Table comprises a number of fields including Programme ID,
which is associated with the Programme ID field of the Programme Table, Keyword ID
and Keyword.
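
The association of tables through common fields can be sketched with an in-memory SQLite database. Only a few of the fields named in the text are included, the column names and types are assumptions, and the genre and keyword values are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (
    genre_id          INTEGER PRIMARY KEY,
    genre_description TEXT
);
CREATE TABLE programme (
    programme_id INTEGER PRIMARY KEY,
    title        TEXT,
    genre_id     INTEGER REFERENCES genre(genre_id)
);
CREATE TABLE keywords (
    keyword_id   INTEGER PRIMARY KEY,
    programme_id INTEGER REFERENCES programme(programme_id),
    keyword      TEXT
);
""")

# Sample rows (illustrative values only).
conn.execute("INSERT INTO genre VALUES (1, 'Sport')")
conn.execute("INSERT INTO programme VALUES (10, 'Grandstand', 1)")
conn.executemany(
    "INSERT INTO keywords VALUES (?, ?, ?)",
    [(100, 10, "football"), (101, 10, "highlights")],
)

# Follow the common Genre ID and Programme ID fields across the tables.
rows = conn.execute("""
    SELECT p.title, g.genre_description, k.keyword
    FROM programme AS p
    JOIN genre AS g ON g.genre_id = p.genre_id
    JOIN keywords AS k ON k.programme_id = p.programme_id
    ORDER BY k.keyword_id
""").fetchall()
```

The same pattern would extend to the other tables by adding the shared ID fields as further foreign keys.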

The Script Table comprises a number of fields including Script ID, Script 
Name, Script Type, Document Format, Path, Creation Date, Original Author, Version,
Last Modified, Modified By, PID associated with Programme ID and Notes. The 
People Table comprises a number of fields including Image. 

The People Table is associated with a number of Individual Tables and a
number of Group Tables. Each Individual Table comprises a number of fields
including Image. Each Group Table comprises a number of fields including Image.
Each Individual Table is associated with either a Production Staff Table or a Cast
Table.

The Production Staff Table comprises a number of fields including Production
Staff ID, Surname, Firstname, Contract ID, Agent, Agency ID, E-mail, Address, Phone
Number, Role ID, Notes, Allergies, DOB, National Insurance Number and Bank ID
and Picture Stamp.

The Cast Table comprises a number of fields including Cast ID, Surname,
Firstname, Character Name, Contract ID, Agent, Agency ID, Equity Number, E-mail,
Address, Phone Number, DOB and Bank ID and Picture Stamp. Associated with the
Production Staff Table and Cast Table are a Bank Details Table and an Agency Table.

The Bank Details Table comprises a number of fields including Bank ID,
which is associated with the Bank ID field of the Production Staff Table and the Bank 
ID field of the Cast Table, Sort Code, Account Number and Account Name. 

The Agency Table comprises a number of fields including Agency ID, which is 
associated with the Agency ID field of the Production Staff Table and the Agency ID 
field of the Cast Table, Name, Address, Phone Number, Web Site and E-mail and a 
Picture Stamp. Also associated with the Production Staff Table is a Role Table. 

The Role Table comprises a number of fields including Role ID, which is 
associated with the Role ID field of the Production Staff Table, Function and Notes 
and a Picture Stamp. Each Group Table is associated with an Organisation Table. 

The Organisation Table comprises a number of fields including Organisation ID,
Name, Type, Address, Contract ID, Contact Name, Contact Phone Number and Web
Site and a Picture Stamp.

Each Media Object Table comprises a number of fields including Media Object
ID, Name, Description, Picturestamp, PID, Format, schedule ID, script ID and Master
ID. Associated with each Media Object Table is the People Table, a Master Table, a
Schedule Table, a Storyboard Table, a script table and a number of Shot Tables.

The Master Table comprises a number of fields including Master ID, which is
associated with the Master ID field of the Media Object Table, Title, Basic UMID,
EDL ID, Tape ID and Duration and a Picture Stamp.

The Schedule Table comprises a number of fields including Schedule ID,
Schedule Name, Document Format, Path, Creation Date, Original Author, Start Date,
End Date, Version, Last Modified, Modified By and Notes and PID which is
associated with the programme ID.

The contract table contains: a contract ID which is associated with the contract
ID of the Production staff, cast, and organisation tables; commencement date, rate, job
title, expiry date and details.

The Storyboard Table comprises a number of fields including Storyboard ID,
which is associated with the Storyboard ID of the shot Table, Description, Author, Path
and Media ID.

Each Shot Table comprises a number of fields including Shot ID, PID, Media 
ID, Title, Location ID, Notes, Picturestamp, script ID, schedule ID, and description. 
Associated with each Shot Table is the People Table, the Schedule Table, script table, 
a Location Table and a number of Take Tables. 

The Location Table comprises a number of fields including Location ID, which
is associated with the Location ID field of the Shot Table, GPS, Address, Description,
Name, Cost Per Hour, Directions, Contact Name, Contact Address and Contact Phone
Number and a Picture Stamp.

Each Take Table comprises a number of fields including Basic UMID, Take
Number, Shot ID, Media ID, Timecode IN, Timecode OUT, Sign Metadata, Tape ID,
Camera ID, Head Hours, Videographer, IN Stamp, OUT Stamp, Lens ID, AUTOID,
Ingest ID and Notes. Associated with each Take Table is a Tape Table, a Task Table, a
Camera Table, a lens table, an ingest table and a number of Take Annotation Tables.

The Ingest table contains an Ingest ID which is associated with the Ingest ID in
the take table and a description.

The Tape Table comprises a number of fields including Tape ID, which is
associated with the Tape ID field of the Take Table, PID, Format, Max Duration, First
Usage, Max Erasures, Current Erasure, ETA (estimated time of arrival) and Last
Erasure Date and a Picture Stamp.

The Task Table comprises a number of fields including Task ID, PID, Media 
ID, Shot ID, which are associated with the Media ID and Shot ID fields respectively of 
the Take Table, Title, Task Notes, Distribution List and CC List. Associated with the 
Task Table is a Planned Shot Table. 

The Planned Shot Table comprises a number of fields including Planned Shot
ID, PID, Media ID and Shot ID, which are associated with the PID, Media ID and Shot ID
respectively of the Task Table, Director, Shot Title, Location, Notes, Description,
Videographer, Due Date, Programme Title, Media Title, Aspect Ratio and Format.

The Camera Table comprises a number of fields including Camera ID, which is
associated with the Camera ID field of the Take Table, Manufacturer, Model, Format,
Serial Number, Head Hours, Lens ID, Notes, Contact Name, Contact Address and
Contact Phone Number and a Picture Stamp.

The Lens Table comprises a number of fields including Lens ID, which is
associated with the Lens ID field of the Take Table, Manufacturer, Model, Serial
Number, Contact Name, Contact Address and Contact Phone Number and a Picture
Stamp.

Each Take Annotation Table comprises a number of fields including Take
Annotation ID, Basic UMID, Timecode, Shutter Speed, Iris, Zoom, Gamma, Shot
Marker ID, Filter Wheel, Detail and Gain. Associated with each Take Annotation
Table is a Shot Marker Table.

The Shot Marker Table comprises a number of fields including Shot Marker
ID, which is associated with the Shot Marker ID of the Take Annotation Table, and
Description.
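
The associations between the Take Table and the Tape, Camera and Lens Tables can be sketched as a small relational schema. The following is a minimal illustration using Python's built-in sqlite3 module. The SQL column types, the use of Basic UMID as a primary key, the reduced set of columns and all sample values are assumptions made for the example and are not part of the described embodiment.

```python
import sqlite3

# Illustrative subset of the tables described above. Column names follow the
# field lists in the text; SQL types, keys and sample rows are assumptions.
SCHEMA = """
CREATE TABLE tape   (tape_id   TEXT PRIMARY KEY, format TEXT);
CREATE TABLE camera (camera_id TEXT PRIMARY KEY, manufacturer TEXT, model TEXT);
CREATE TABLE lens   (lens_id   TEXT PRIMARY KEY, manufacturer TEXT, model TEXT);
CREATE TABLE take (
    basic_umid   TEXT PRIMARY KEY,
    take_number  INTEGER,
    shot_id      TEXT,
    timecode_in  TEXT,
    timecode_out TEXT,
    tape_id      TEXT REFERENCES tape(tape_id),
    camera_id    TEXT REFERENCES camera(camera_id),
    lens_id      TEXT REFERENCES lens(lens_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one hypothetical take."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO tape VALUES ('TAPE-1', 'ExampleFormat')")
    conn.execute("INSERT INTO camera VALUES ('CAM-1', 'ExampleCo', 'Model A')")
    conn.execute("INSERT INTO lens VALUES ('LENS-1', 'ExampleCo', 'Model B')")
    conn.execute(
        "INSERT INTO take VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("UMID-0001", 1, "SHOT-1", "00:00:10:00", "00:00:25:12",
         "TAPE-1", "CAM-1", "LENS-1"),
    )
    return conn


def takes_with_equipment(conn: sqlite3.Connection) -> list:
    """Follow the Tape ID, Camera ID and Lens ID associations of each take."""
    return conn.execute(
        """
        SELECT take.take_number, take.timecode_in, take.timecode_out,
               tape.format, camera.model, lens.model
        FROM take
        JOIN tape   ON tape.tape_id     = take.tape_id
        JOIN camera ON camera.camera_id = take.camera_id
        JOIN lens   ON lens.lens_id     = take.lens_id
        ORDER BY take.take_number
        """
    ).fetchall()
```

Calling `takes_with_equipment(build_demo_db())` returns one row per take with the in and out time codes alongside the tape format, camera model and lens model reached through the ID fields.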

UMID Description

A UMID is described in SMPTE Journal March 2000 which provides details of
the UMID standard. Referring to Figures 14 and 15, a basic and an extended UMID are
shown. The extended UMID comprises a first set of 32 bytes of basic UMID and a
second set of 32 bytes of signature metadata.

The first set of 32 bytes is the basic UMID. The components are:

•A 12-byte Universal Label to identify this as a SMPTE UMID. It defines the
type of material which the UMID identifies and also defines the methods by which the
globally unique Material and locally unique Instance numbers are created.

•A 1-byte length value to define the length of the remaining part of the UMID.

•A 3-byte Instance number which is used to distinguish between different
'instances' of material with the same Material number.

•A 16-byte Material number which is used to identify each clip. Each Material
number is the same for related instances of the same material.

The second set of 32 bytes is the signature metadata, a set of packed
metadata items used to create an extended UMID. The extended UMID comprises the
basic UMID followed immediately by signature metadata which comprises:

•An 8-byte time/date code identifying the time and date of the Content Unit
creation.

•A 12-byte value which defines the spatial co-ordinates at the time of Content
Unit creation.

•3 groups of 4-byte codes which register the country, organisation and user
codes.
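
The byte layout listed above can be illustrated by slicing a UMID into its components. The following is a sketch only: the class and function names are hypothetical, and the ordering of the country, organisation and user codes within the last 12 bytes is assumed to follow the order in which they are listed.

```python
from typing import NamedTuple, Optional


class Umid(NamedTuple):
    universal_label: bytes                 # 12 bytes
    length: int                            # 1 byte
    instance_number: bytes                 # 3 bytes
    material_number: bytes                 # 16 bytes
    time_date: Optional[bytes] = None      # 8 bytes, extended UMID only
    spatial: Optional[bytes] = None        # 12 bytes, extended UMID only
    country: Optional[bytes] = None        # 4 bytes, extended UMID only
    organisation: Optional[bytes] = None   # 4 bytes, extended UMID only
    user: Optional[bytes] = None           # 4 bytes, extended UMID only


def split_umid(raw: bytes) -> Umid:
    """Slice a 32-byte basic or 64-byte extended UMID into its components."""
    if len(raw) not in (32, 64):
        raise ValueError("expected 32 (basic) or 64 (extended) bytes")
    basic = Umid(raw[0:12], raw[12], raw[13:16], raw[16:32])
    if len(raw) == 32:
        return basic
    sig = raw[32:64]  # signature metadata
    # Assumed order within the last 12 bytes: country, organisation, user.
    return basic._replace(
        time_date=sig[0:8],
        spatial=sig[8:20],
        country=sig[20:24],
        organisation=sig[24:28],
        user=sig[28:32],
    )
```

The component sizes sum to 12 + 1 + 3 + 16 = 32 bytes for the basic UMID and 8 + 12 + 3 x 4 = 32 bytes for the signature metadata.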

Each component of the basic and extended UMIDs will now be defined in turn. 

The 12-byte Universal Label

The first 12 bytes of the UMID provide identification of the UMID by the
values given in the following table.

Table 1

Byte No.  Description                               Value (hex)
1         Object Identifier                         06h
2         Label size                                0Ch
3         Designation: ISO                          2Bh
4         Designation: SMPTE                        34h
5         Registry: Dictionaries                    01h
6         Registry: Metadata Dictionaries           01h
7         Standard: Dictionary Number               01h
8         Version number                            01h
9         Class: Identification and location        01h
10        Sub-class: Globally Unique Identifiers    01h
11        Type: UMID (Picture, Audio, Data, Group)  01, 02, 03, 04h
12        Type: Number creation method              XXh

Referring to Table 1, in the example shown byte 4 indicates that bytes 5-12
relate to a data format agreed by SMPTE. Byte 5 indicates that bytes 6 to 10 relate to
"dictionary" data. Byte 6 indicates that such data is "metadata" defined by bytes 7 to
10. Byte 7 indicates the part of the dictionary containing metadata defined by bytes 9
and 10. Byte 8 indicates the version of the dictionary. Byte 9 indicates the class of
data and Byte 10 indicates a particular item in the class.

In the present embodiment bytes 1 to 10 have fixed pre-assigned values. Byte
11 is variable. Thus referring to Figure 15, and to Table 1 above, it will be noted that
the bytes 1 to 10 of the label of the UMID are fixed. Therefore they may be replaced
by a 1 byte 'Type' code T representing the bytes 1 to 10. The type code T is followed
by a length code L. That is followed by 2 bytes, one of which is byte 11 of Table 1 and
the other of which is byte 12 of Table 1, an instance number (3 bytes) and a material
number (16 bytes). Optionally the material number may be followed by the signature
metadata of the extended UMID and/or other metadata.

The UMID type (byte 11) has 4 separate values to identify each of 4 different
data types as follows:

'01h' = UMID for Picture material

'02h' = UMID for Audio material

'03h' = UMID for Data material

'04h' = UMID for Group material (i.e. a combination of related essence)

The last (12th) byte of the 12 byte label identifies the methods by which the 
material and instance numbers are created. This byte is divided into top and bottom 
nibbles where the top nibble defines the method of Material number creation and the 
bottom nibble defines the method of Instance number creation. 
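
The decoding of bytes 11 and 12 described above can be sketched as follows. The function name is hypothetical, and the creation method codes are returned as raw 4-bit values because their meanings are not listed in this description.

```python
UMID_TYPES = {0x01: "Picture", 0x02: "Audio", 0x03: "Data", 0x04: "Group"}


def decode_label_type(byte11: int, byte12: int) -> tuple:
    """Decode bytes 11 and 12 of the Universal Label.

    Returns (material type, Material number creation method,
    Instance number creation method).
    """
    if byte11 not in UMID_TYPES:
        raise ValueError(f"unknown UMID type byte: {byte11:#04x}")
    material_method = (byte12 >> 4) & 0x0F  # top nibble
    instance_method = byte12 & 0x0F         # bottom nibble
    return UMID_TYPES[byte11], material_method, instance_method
```

For example, `decode_label_type(0x02, 0x3A)` yields `("Audio", 3, 10)`: Audio material, Material number method 3 and Instance number method 10 (Ah).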

Length

The Length is a 1-byte number with the value '13h' for basic UMIDs and '33h'
for extended UMIDs.
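
These two values agree with the component sizes given earlier if the Length counts the bytes that follow it: 3 + 16 = 19 = 13h for a basic UMID, and 19 + 32 = 51 = 33h for an extended UMID. A minimal check of that arithmetic, with constant and function names chosen purely for illustration:

```python
INSTANCE_BYTES = 3     # Instance number
MATERIAL_BYTES = 16    # Material number
SIGNATURE_BYTES = 32   # signature metadata (extended UMID only)


def expected_length(extended: bool) -> int:
    """Number of bytes following the Length byte."""
    remaining = INSTANCE_BYTES + MATERIAL_BYTES
    if extended:
        remaining += SIGNATURE_BYTES
    return remaining
```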

Instance Number

The Instance number is a unique 3-byte number which is created by one of
several means defined by the standard. It provides the link between a particular
'instance' of a clip and externally associated metadata. Without this instance number,
all material could be linked to any instance of the material and its associated metadata.

The creation of a new clip requires the creation of a new Material number
together with a zero Instance number. Therefore, a non-zero Instance number indicates
that the associated clip is not the source material. An Instance number is primarily
used to identify associated metadata related to any particular instance of a clip.

Material Number 

The 16-byte Material number is a non-zero number created by one of several
means identified in the standard. The number is dependent on a 6-byte registered port
ID number, time and a random number generator.

Signature Metadata 

Any component from the signature metadata may be null-filled where no
meaningful value can be entered. Any null-filled component is wholly null-filled to
clearly indicate to a downstream decoder that the component is not valid.

The Time-Date Format 

The date-time format is 8 bytes where the first 4 bytes are a UTC (Coordinated
Universal Time) based time component. The time is defined either by an AES3 32-bit
audio sample clock or SMPTE 12M depending on the essence type.

The second 4 bytes define the date based on the Modified Julian Date (MJD) as
defined in SMPTE 309M. This counts up to 999,999 days after midnight on the 17th
November 1858 and allows dates to the year 4597.
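
The relationship between a Modified Julian Date day count and a calendar date can be sketched as below. This covers only the day count measured from 17th November 1858; it does not attempt to model how the count is packed into the 4-byte field, which is not detailed here. The function names are hypothetical.

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # day 0 of the Modified Julian Date


def mjd_to_date(mjd: int) -> date:
    """Convert an MJD day count to a calendar date."""
    if not 0 <= mjd <= 999_999:
        raise ValueError("MJD outside the 0..999,999 range described above")
    return MJD_EPOCH + timedelta(days=mjd)


def date_to_mjd(day: date) -> int:
    """Convert a calendar date to an MJD day count."""
    return (day - MJD_EPOCH).days
```

For example, `date_to_mjd(date(2000, 1, 1))` gives 51544.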

The Spatial Co-ordinate Format 

The spatial co-ordinate value consists of three components defined as follows:

•Altitude: 8 decimal numbers specifying up to 99,999,999 metres.

•Longitude: 8 decimal numbers specifying East/West 180.00000 degrees (5
decimal places active).

•Latitude: 8 decimal numbers specifying North/South 90.00000 degrees (5
decimal places active).

The Altitude value is expressed as a value in metres from the centre of the earth
thus allowing altitudes below the sea level.



It should be noted that although spatial co-ordinates are static for most clips, 
this is not true for all cases. Material captured from a moving source such as a camera 
mounted on a vehicle may show changing spatial co-ordinate values. 
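
Three components of 8 decimal digits each fit the 12-byte spatial value if every component occupies 4 bytes with two digits per byte. The sketch below assumes such a packed binary-coded-decimal layout with the most significant digits first. That packing is an assumption made for illustration and is not stated in this description, and the sketch does not model how East/West or North/South is signalled. The function names are hypothetical.

```python
def pack_bcd8(value: int) -> bytes:
    """Pack an 8-digit decimal value into 4 bytes, two digits per byte."""
    if not 0 <= value <= 99_999_999:
        raise ValueError("value must have at most 8 decimal digits")
    digits = f"{value:08d}"
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1]) for i in range(0, 8, 2)
    )


def unpack_bcd8(raw: bytes) -> int:
    """Recover the decimal value from 4 packed-BCD bytes."""
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = 0
    for byte in raw:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("nibble is not a decimal digit")
        value = value * 100 + high * 10 + low
    return value
```

Under this assumption an altitude of 6,371,000 metres from the centre of the earth would be stored as the bytes 06 37 10 00 (hex).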

Country Code 

The Country code is an abbreviated 4-byte alpha-numeric string according to 
the set defined in ISO 3166. Countries which are not registered can obtain a registered 
alpha-numeric string from the SMPTE Registration Authority. 

Organisation Code 

The Organisation code is an abbreviated 4-byte alpha-numeric string registered 
with SMPTE. Organisation codes have meaning only in relation to their registered 
Country code so that Organisation codes can have the same value in different 
countries. 

User Code 

The User code is a 4-byte alpha-numeric string assigned locally by each 
organisation and is not globally registered. User codes are defined in relation to their 
registered Organisation and Country codes so that User codes may have the same value 
in different organisations and countries. 

Freelance Operators 

Freelance operators may use their country of domicile for the country code and
use the Organisation and User codes concatenated to e.g. an 8 byte code which can be
registered with SMPTE. These freelance codes may start with the '~' symbol (ISO
8859 character number 7Eh) followed by a registered 7 digit alphanumeric string.

As will be appreciated by those skilled in the art various modifications may be
made to the embodiments herein before described without departing from the scope of
the present invention. For example whilst embodiments have been described with
recording audio/video onto magnetic tape, it will be appreciated that other recording
media are possible. Furthermore although the user generated metadata has been
represented as text information, it will be appreciated that any other forms of metadata
may be generated either automatically or under control of the user and received within
the audio and/or video generation apparatus via an interface unit. Correspondingly the
secondary metadata may be any form of semantic or syntactic metadata.

As will be appreciated those features of the invention which appear in the
example embodiments as a data processor or processing units could be implemented in
hardware as well as a software computer program running on an appropriate data
processor. Correspondingly those aspects and features of the invention which are
described as computer or application programs running on a data processor may be
implemented as dedicated hardware. It will therefore be appreciated that a computer
program running on a data processor which serves to form an audio and/or video
generation apparatus as herein before described is an aspect of the present invention.
Similarly a computer program recorded onto a recordable medium which serves to
define the method according to the present invention or when loaded onto a computer
forms an apparatus according to the present invention are aspects of the present
invention.






CLAIMS 

1. An audio and/or video generation apparatus which is arranged in operation to
generate audio and/or video signals representative of an audio and/or visual source,
said audio and/or video generation apparatus comprising

- a recording means which is arranged in operation to record said audio and/or
video signals on a recording medium, wherein

- said audio and/or video generation apparatus is arranged to receive metadata
associated with said audio and/or video signals generated by a data processor, said
recording means being arranged in operation to record said metadata on said recording
medium with said audio and/or video signals.

2. An audio and/or video generation apparatus as claimed in Claim 1, comprising
an interface having a predetermined format for connecting said data processor to said
audio and/or video generation apparatus, whereby said generation apparatus is
arranged to receive said metadata.

3. An audio and/or video generation apparatus as claimed in Claims 1 or 2,
wherein said data processor is arranged to detect signals representative of the time code
of the recorded audio/video signals, and said metadata includes time code data
representative of the in and out points of a take of the audio/video signals generated by
said data processor.

4. An audio and/or video generation apparatus as claimed in Claims 1, 2 or 3,
wherein said metadata is a unique identification code for identifying the audio/video
signals.

5. An audio and/or video generation apparatus as claimed in Claim 4, wherein the
unique identification code is a UMID or the like.

6. An audio and/or video generation apparatus which is arranged in operation to
generate audio and/or video signals representative of an audio and/or visual source,
said audio and/or video generation apparatus comprising

- a data processor which is arranged in operation to detect time codes
associated with said audio and/or video signals and to store data being representative
of said time codes associated with at least part of said audio/video signal in a data store.

7. An audio and/or video generation apparatus as claimed in Claim 6, wherein
said time code data is representative of the time codes at an in point and an out point of
said at least part of the audio/video signals.

9. An audio and/or video generation apparatus as claimed in Claim 6 or 7,
wherein said metadata includes a unique identification code for identifying the
audio/video signals.

10. An audio and/or video generation apparatus as claimed in Claim 9, wherein the
unique identification code is a UMID or the like.

11. A metadata generation tool which is arranged in operation to receive audio
and/or video signals representative of an audio and/or visual source, and to generate
metadata associated with said audio and/or video signals, said generation apparatus
comprising

- a data processor which is arranged in operation to generate said metadata in
response to said audio and/or video signals and to store said metadata associated with
at least part of said audio/video signals in a data store, wherein said data processor is
arranged in operation to detect time codes associated with said audio and/or video
signals, said generated metadata being representative of said time codes associated
with at least part of said audio/video signals.

12. A metadata generation tool as claimed in Claim 11, wherein said metadata is
representative of the time codes at an in point and an out point of said at least part of
the audio/video signals.

13. A metadata generation tool as claimed in Claim 11 or 12, wherein said
metadata includes a unique identification code for identifying the audio/video signals.

14. A metadata generation tool as claimed in Claim 13, wherein the unique
identification code is a UMID or the like.

15. A metadata generation tool, wherein said audio/video signals are representative
of items of audio/video material, and the data processor operates to generate a log of said
time code data for each of said items of audio/video material.

16. A metadata generation tool as claimed in Claim 15, comprising a data store
wherein said data processor is arranged in operation to store said log in said data store.

17. A method of generating audio/video signals comprising the steps of

- generating audio and/or video signals representative of an audio and/or visual
source,

- recording said audio and/or video signals on a recording medium,

- generating from said audio and/or video signals metadata describing said
audio and/or video signals and

- storing said metadata.

18. A method of generating as claimed in Claim 17, wherein the step of storing
said metadata comprises the step of

- recording said metadata on said recording medium with said audio and/or
video signals.

19. A method of generating as claimed in Claim 17, wherein the step of storing
said metadata comprises the step of

- storing said metadata in a data store separate to said audio and/or video
signals.

20. A method of generating as claimed in any of Claims 17, 18 or 19, wherein the
step of generating said metadata comprises the steps of

- generating time codes identifying a location on said recording medium where
said audio/video signals are recorded,

- detecting the time codes associated with the in and out points of part or parts
of said audio/video signals, and

- forming said metadata from said detected in and out points.

21. A computer program providing computer executable instructions, which when
loaded on to a data processor configures said data processor to operate as an audio/video
generation apparatus as claimed in any of Claims 1 to 10, or a metadata
generation tool as claimed in any of Claims 11 to 16.

22. A computer program having computer executable instructions, which when
loaded on to a data processor causes the data processor to perform the method
according to any of Claims 17 to 20.

23. A computer program product having a computer readable medium having
recorded thereon information signals representative of the computer program claimed
in any of Claims 21 or 22.

24. An audio and/or video generation apparatus as herein before described with
reference to the accompanying drawings.

25. A metadata generation tool as herein before described with reference to the
accompanying drawings.

26. A method of generating audio and/or video signals as herein before described
with reference to the accompanying drawings.